ABSTRACT

Protein modification is an extremely important
post-translational regulatory mechanism that adjusts the
physical and chemical properties, conformation,
stability and activity of a protein, thereby altering
protein function. Due to the high throughput of
mass spectrometry (MS)-based methods in identify- 
ing site-specific post-translational modifications 
(PTMs), dbPTM (http://dbPTM.mbc.nctu.edu.tw/) is 
updated to integrate experimental PTMs obtained 
from public resources as well as manually curated 
MS/MS peptides associated with PTMs from 
research articles. Version 3.0 of dbPTM aims to be 
an informative resource for investigating the sub- 
strate specificity of PTM sites and functional asso- 
ciation of PTMs between substrates and their 
interacting proteins. In order to investigate the sub- 
strate specificity for modification sites, a newly de- 
veloped statistical method has been applied to 
identify the significant substrate motifs for each 
type of PTMs containing sufficient experimental 
data. According to the data statistics in dbPTM, 
>60% of PTM sites are located in the functional 
domains of proteins. It is known that most PTMs 
can create binding sites for specific protein- 
interaction domains that work together for cellular 
function. Thus, this update integrates protein-protein
interaction and domain-domain interaction information
to determine the functional association of PTM sites
located in protein-interacting domains. Additionally,
the information of structural topologies on trans- 
membrane (TM) proteins is integrated in dbPTM in 
order to delineate the structural correlation between 
the reported PTM sites and TM topologies. To facili- 
tate the investigation of PTMs on TM proteins, the 
PTM substrate sites and the structural topology are 
graphically represented. Literature information
related to PTMs, orthologous conservation and
substrate motifs of PTMs are also provided in the
resource. Finally, this version features an improved
web interface to facilitate convenient access to the 
resource. 



INTRODUCTION 

Protein post-translational modification (PTM) plays an
essential role in various cellular processes by adjusting
the physical and chemical properties, folding, conformation,
stability and activity of proteins, thereby altering
protein function (1). More than 200 different types of
PTMs have been identified by mass spectrometry (MS)-based
proteomics (2). The biological functions of these
ubiquitous regulatory mechanisms include phosphorylation
for signal transduction, attachment of fatty acids
for membrane anchoring and association, glycosylation
for changing protein half-life, targeting substrates and
promoting cell-cell and cell-matrix interactions, acetylation
and methylation of histones for gene regulation and
ubiquitylation for protein degradation (3). With
high-throughput MS or MS/MS-based methods in proteomics,
several databases associated with a specific modification
type have been established. Phospho.ELM (4),
Phosphorylation Site Database (5), PhosphoSitePlus (6), 
PHOSIDA (7) and PhosPhAt (8) were developed for 
accumulating experimentally verified phosphorylation 
sites. NetworKIN (9) and RegPhos (10) provide integrative
methods to identify kinase-substrate phosphorylation
networks. O-GLYCBASE (11) and dbOGAP (12)
are databases of glycoproteins, most of which include
experimentally verified O-linked glycosylation sites.
UbiProt (13) stores experimental ubiquitylated proteins 
and ubiquitylation sites, which are implicated in protein 
degradation through an intracellular ATP-dependent pro- 
teolytic system. PupDB (14) is a prokaryotic ubiquitin-like 
protein (Pup) database which stores a collection of experi- 
mentally identified pupylated proteins and pupylation 
sites from published articles. It also integrates the
information on pupylated proteins with corresponding
structures and functional annotations. An increasing number
of proteomic studies have suggested that protein
S-nitrosylation plays an important role in the nitric oxide
(NO)-related redox pathway. Accordingly, a database
named dbSNO (15) was established by manually curating
S-nitrosylation peptides from research articles.

With regard to public resources of multiple PTM types 
currently available, UniProtKB/Swiss-Prot (2,16) includes 
as much PTM information as is available, with functional
and structural annotations. SysPTM (17) provides
a systematic platform for multi-type PTM
research and data mining. Additionally, the Human Protein
Reference Database (HPRD) (18) contains a wealth of
information relevant to the function of human proteins
in health and disease, as well as the annotation of
PTMs. Given the importance of protein modifications in
biological processes, we previously proposed dbPTM
(19), which integrates published databases in order to
obtain experimentally validated protein modifications, as 
well as putative PTM substrate sites predicted by a series 
of accurate computational tools (20-22). Version 2.0 of 
dbPTM was extended to a knowledge base comprising 
the modified sites, solvent accessibility of substrate, 
protein secondary and tertiary structures, protein 
domains and protein variations (23). 

Due to the high throughput of MS/MS-based methods 
in identifying site-specific PTMs, this version (dbPTM 3.0) 
not only integrates experimental PTMs from public re- 
sources but also manually curates MS/MS peptides 
associated with PTMs from research articles using a text 
mining approach. dbPTM 3.0 aims to be an informative
resource for investigating the substrate specificity of
PTM sites and functional association of PTMs between 
substrates and their interacting proteins. In order to inves- 
tigate the substrate specificity for modification sites, a 
newly developed method, MDDLogo (24), has been 
applied to identify the significant substrate motifs for 
each type of PTMs. According to the data statistics in 
dbPTM, >60% of PTM sites are located in protein functional
domains. Many PTMs can create binding sites for
specific protein-interaction domains that work together
for cellular function and relate the state of the proteome to
cellular organization (25). Thus, this update integrates
both protein-protein interaction (PPI) and domain- 
domain interaction information to determine the 
functional association of PTM sites located in protein- 
interacting domains. Additionally, in order to delineate 
the structural correlation between the reported PTM 
sites and transmembrane (TM) topologies, the informa- 
tion of structural topologies on TM proteins is integrated 
in dbPTM 3.0. To facilitate the investigation of PTMs on 
TM proteins, PTM sites as well as the structural topology 
of TM proteins are graphically represented. Furthermore,
the web interface is enhanced to facilitate access to the
resource, which is now freely accessible at
http://dbPTM.mbc.nctu.edu.tw/.

IMPROVEMENTS 

The highlighted improvements and advances in dbPTM
3.0 are presented in Figure 1, including data integration
from public PTM resources and research articles, investigation
of PTM substrate site specificity, investigation of
PTM-associated protein interactions and investigation
of the effects of PTMs on TM proteins. To facilitate
the study of PTMs and their functions, the web
interface has been redesigned and enhanced. Published literature
related to PTMs, orthologous conservation
and substrate motifs of PTM sites are also provided in this
online resource. The details of each improvement are
described as follows.

Data integration from public PTM resources and research 
articles 

Supplementary Figure S1 shows the detailed system flow
of the construction of dbPTM 3.0. Owing to the inaccessibility
of database contents in several online PTM
resources, a total of 11 biological databases related to
PTMs are integrated in dbPTM, including UniProtKB/
Swiss-Prot (2), version 9.0 of Phospho.ELM (4),
PhosphoSitePlus (6), PHOSIDA (26), version 6.0 of
O-GLYCBASE (11), dbOGAP (12), dbSNO (15),
version 1.0 of UbiProt (13), PupDB (14), version 1.1 of
SysPTM (17) and release 9.0 of HPRD (27). A brief
description and the data statistics of the integrated databases
are given in Supplementary Table S1. To resolve the
heterogeneity among the data collected from different sources,
the reported modification sites are mapped to the
UniProtKB protein entries using sequence comparison.
With the high throughput of MS-based methods in
post-translational proteomics, this update also includes
manually curated MS/MS-identified peptides associated
with PTMs from research articles through a literature
survey. First, a table of PTM-related keywords is constructed
by referring to the UniProtKB/Swiss-Prot PTM
list (http://www.uniprot.org/docs/ptmlist.txt) and the
annotations of RESID (28). Then, all fields in the PubMed
database are searched using the keywords in the constructed
table, and the full text of the matching research articles
is downloaded. For the various
experiments of proteomic identification, a text-mining
system is developed to survey full-text literature that
potentially describes the site-specific identification of
modified sites. Approximately 800 original and review
articles associated with MS/MS proteomics and protein
modifications are retrieved from PubMed (July 2012).
Next, the full-length articles are manually reviewed to
extract precisely the MS/MS peptides along with the
modified sites. Furthermore, in order to determine the
locations of PTMs on a full-length protein sequence, the
experimentally verified MS/MS peptides are then
mapped to UniProtKB protein entries based on their
database identifier (ID) and sequence identity. In the
process of data mapping, MS/MS peptides that cannot
be aligned exactly to a protein sequence are discarded.
Finally, each mapped PTM site is annotated with its
corresponding literature reference (PubMed ID).

Figure 1. The highlighted improvements and advances in dbPTM 3.0.
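
The peptide-mapping step described above can be illustrated with a
minimal sketch. It assumes that "exact alignment" means an exact
substring match and that the modified residue is given as a 1-based
index within the peptide; the function name and the example sequences
are hypothetical and are not taken from dbPTM.

```python
from typing import Optional


def map_peptide_site(protein_seq: str, peptide: str,
                     site_in_peptide: int) -> Optional[int]:
    """Return the 1-based protein position of a modified residue.

    site_in_peptide is the 1-based index of the modified residue
    within the peptide. None is returned when the peptide does not
    match the protein sequence exactly, mirroring the rule that such
    peptides are discarded. Only the first match is considered, which
    is a simplification made for this sketch.
    """
    start = protein_seq.find(peptide)  # 0-based offset, -1 if absent
    if start == -1:
        return None
    return start + site_in_peptide


# Hypothetical protein sequence and peptide, for illustration only.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
print(map_peptide_site(protein, "QISFVK", 3))  # position of the serine
print(map_peptide_site(protein, "QISFAK", 3))  # no exact match
```

A peptide that occurs more than once in a protein would need an
explicit policy (for example, reporting every occurrence); the source
does not state how such cases are handled.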

Detection of PTM substrate site specificities 

Because conserved motifs are difficult to detect for
a specific PTM with a large data set, MDDLogo (24) was
used to identify the substrate motifs for each type of
PTM containing >500 modified peptides. MDDLogo
exploits maximal dependence decomposition (MDD) in
order to discover conserved motifs from a group of
aligned signal sequences. MDD groups a set of aligned
signal sequences into subgroups that capture the most
significant dependencies between positions. MDD adopts
the chi-squared test $\chi^2(A_i, A_j)$ to evaluate the dependence of
amino acid occurrence between two positions $A_i$ and $A_j$
that surround the PTM substrate sites. MDDLogo has
demonstrated its effectiveness in identifying substrate
motifs of plant and virus phosphorylation (29,30),
as well as mouse S-nitrosylation (31). In order to
extract motifs in which the biochemical
properties of amino acids are conserved, MDD categorizes
the 20 types of amino acids into five groups:
aliphatic, polar and uncharged, acidic, basic and aromatic,
as shown in Supplementary Figure S2. An
example of MDD clustering on S-nitrosylation data
shows that position -7 has the maximal dependence
with the occurrence of basic amino acids, including
lysine (K), arginine (R) and histidine (H). Subsequently,
all data can be divided into two subgroups: one has
basic amino acids in position -7
and the other does not. MDD clustering is a
recursive process that divides the data set into tree-like
subgroups.
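
The two building blocks of this procedure, the chi-squared dependence
between two positions and the split into subgroups, can be sketched as
follows. This is not the MDDLogo implementation: the group membership
below is an assumption (the source defines the groups in Supplementary
Figure S2, and only the basic group K, R, H is confirmed in the text),
and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed amino acid grouping; only "basic" (K, R, H) is stated in the text.
GROUPS = {
    "aliphatic": "AGVLIMP",
    "polar": "STCNQ",
    "acidic": "DE",
    "basic": "KRH",
    "aromatic": "FYW",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def chi_squared(seqs, i, j):
    """Pearson chi-squared statistic for group occurrence at columns i and j."""
    pairs = [(AA_TO_GROUP[s[i]], AA_TO_GROUP[s[j]]) for s in seqs]
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    stat = 0.0
    for a, b in product(row_totals, col_totals):
        expected = row_totals[a] * col_totals[b] / n
        stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def split_on(seqs, i, group):
    """Divide sequences by whether column i holds a residue of `group`."""
    with_group = [s for s in seqs if AA_TO_GROUP[s[i]] == group]
    without_group = [s for s in seqs if AA_TO_GROUP[s[i]] != group]
    return with_group, without_group


# Toy aligned fragments (two columns only), for illustration.
dependent = ["KD", "RE", "AS", "GT"]
print(chi_squared(dependent, 0, 1))
print(split_on(dependent, 0, "basic"))
```

Applying `split_on` recursively to each subgroup would yield the
tree-like subgroups mentioned above. Residues outside the 20 standard
amino acids (for example X) would raise a KeyError in this sketch and
would need to be filtered beforehand.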

Integration of protein domains, domain-domain 
interactions and PPIs 

Protein-interaction domains usually recognize short
peptide motifs of a target protein but do not bind stably
until the peptides carry the appropriate PTMs; PTMs can thus
create binding sites for specific protein-interaction
domains that work together for cellular function and
relate the state of the proteome to cellular organization (25).
For instance, the SH2 domain can bind to phosphotyrosine
(pTyr)-containing peptides in a manner that
depends on ligand phosphorylation and the motif of the
flanking amino acids (32,33). Thus, this update integrates
the information on protein functional domains and PPIs to
infer PTM-dependent protein interactions. To investigate
the preference of functional domains for PTMs, this
study refers to the domain annotations in InterPro (34).
InterPro is an integrated resource, which was developed
initially as a means of rationalizing the complementary
efforts of the PROSITE (35), PRINTS (36), Pfam (37)
and ProDom (38) databases, for providing protein 'signatures'
such as protein families, domains and functional
sites. For the information on experimentally verified
PPIs, five databases, DIP (39), MINT (40),
IntAct (41), HPRD (18) and STRING (42), are integrated
in dbPTM (see Supplementary Table S2). Additionally,
the domain-domain interactions of InterDom (43) are
integrated to determine the functional association
of the PTM sites located in protein-interacting
domains.

Integration of TM proteins with structural topology 

TM proteins play crucial roles in various cellular processes 
(44). A genome-wide study has discovered that ~20-30% 
of the proteins encoded by a typical genome are TM 
proteins (45). However, due to the experimental
difficulties in obtaining high-quality structures, TM
proteins are notably under-represented in the Protein Data
Bank (46). The biological roles of PTMs on TM
proteins include phosphorylation for signal transduction
and ion transport, acetylation for structural stability,
attachment of fatty acids for membrane anchoring and
association, as well as glycosylation for receptor
targeting, cell-cell interactions and virus infection
(44,47). Given the importance of PTMs on
TM proteins, experimentally curated information on
membrane topologies is collected from TMPad (48),
TOPDB (49), PDB_TM (50) and OPM (51). In order to
provide a comprehensive investigation of TM proteins, a
potential set of TM proteins is extracted from UniProtKB
(52) by choosing protein entries that contain the
keyword 'TRANSMEM' in the feature ('FT') line, the
localization 'membrane' and information on TM
topology. The potential TM proteins are further filtered
using the TM prediction program MEMSAT (53) to determine
their membrane topologies. As shown in
Supplementary Table S3, the filtering process resulted in
2216 experimental and 43 142 potential TM proteins with
membrane topologies. To facilitate the investigation of
PTMs on TM proteins, the structural topology of TM
proteins and the PTM substrate sites are graphically
represented using the PHP GD library. Moreover,
the tertiary structures of TM proteins and PTM sites are
visualized using the Jmol program (54).
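
As an illustration of the 'TRANSMEM' selection criterion, the sketch
below collects TM segment coordinates from the FT lines of a UniProtKB
flat-file entry. It is only a sketch under stated assumptions: the
regular expression accepts both the older column-separated coordinates
and the newer `start..end` form, it skips uncertain coordinates such as
`?` or `<1`, and the function name and example entry are hypothetical.

```python
import re

# Matches "FT   TRANSMEM     35     58 ..." and "FT   TRANSMEM        72..95".
_TM_LINE = re.compile(r"^FT\s+TRANSMEM\s+(\d+)(?:\.\.|\s+)(\d+)")


def transmembrane_segments(entry_text):
    """Return (start, end) pairs for every TRANSMEM feature in an entry."""
    segments = []
    for line in entry_text.splitlines():
        match = _TM_LINE.match(line)
        if match:
            segments.append((int(match.group(1)), int(match.group(2))))
    return segments


# Hypothetical entry fragment mixing both coordinate styles.
entry = (
    "ID   TEST_HUMAN\n"
    "FT   TRANSMEM     35     58       Helical.\n"
    "FT   TRANSMEM        72..95\n"
    "FT   MOD_RES     141    141       Phosphotyrosine.\n"
)
print(transmembrane_segments(entry))
```

An entry would qualify as a potential TM protein under the first
criterion when the returned list is non-empty; the localization check
and the MEMSAT filtering described above are not covered here.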



Integration of external biological databases 

For a given protein, the basic biological functions can be 
obtained from the annotations of UniProtKB. To provide 
more information about protein functional and structural 
annotations relevant to the modified proteins and the 
PTM substrate sites, the data contents of Gene 
Ontology (GO) (55), Protein Data Bank (PDB) (46) and 
Clusters of Orthologous Groups (COGs) (56) have been 
integrated in dbPTM. In this study, the information
regarding the molecular function, cellular component and
biological process of a modified protein can be accessed
through a cross-link that refers to the corresponding entry in
QuickGO (57) via a UniProtKB accession number. In
order to facilitate the investigation of structural characteristics
surrounding the PTM substrate sites, protein tertiary
structures obtained from PDB are graphically presented
by the Jmol program. For proteins with tertiary structures
(5% of UniProtKB/Swiss-Prot proteins), the protein
structural properties, such as solvent accessibility and
secondary structure of residues, are calculated by DSSP
(58). With respect to previous studies investigating
the structural characteristics of PTMs (59-61), for
proteins without known tertiary structures, two effective
tools, RVP-net (62) and PSIPRED (63), are used to
predict the solvent accessibility and secondary structure,
respectively. In order to observe whether a PTM site is
located in a conserved region among orthologous
protein sequences, the COGs of proteins are integrated
and the ClustalW (64) program is adopted to
align the multiple protein sequences in each COG
cluster.
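
The conservation check on an orthologous alignment can be sketched as
below. The sketch assumes an alignment held as a dictionary of equally
long gapped strings (for example parsed from ClustalW output); the
function names and the toy alignment are hypothetical.

```python
def alignment_column(aligned_seq, position, gap="-"):
    """Map a 1-based ungapped residue position to a 0-based alignment column."""
    seen = 0
    for col, char in enumerate(aligned_seq):
        if char != gap:
            seen += 1
            if seen == position:
                return col
    raise ValueError("position exceeds the ungapped sequence length")


def conserved_fraction(alignment, query_id, position):
    """Fraction of orthologs sharing the query residue at a PTM site."""
    col = alignment_column(alignment[query_id], position)
    residue = alignment[query_id][col]
    others = [seq[col] for seq_id, seq in alignment.items() if seq_id != query_id]
    if not others:
        return 0.0
    return sum(r == residue for r in others) / len(others)


# Toy alignment of a query and two orthologs.
alignment = {"query": "MK-SA", "orth1": "MKTSA", "orth2": "MR-TA"}
print(conserved_fraction(alignment, "query", 3))  # serine at query position 3
```

Identity of the residue is only one possible criterion; a site could
also be called conserved when the flanking window is similar, which the
source does not specify.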

DATA CONTENT AND UTILITY 

Data statistics of the integrated PTM sites 

In order to provide the most comprehensive data of 
PTMs, this update not only integrates experimental 
PTMs from 11 external PTM-related resources but also 
manually curates MS/MS peptides associated with PTMs 
from ~800 research articles. After removing redundant
data among these heterogeneous resources, there
are a total of 208 521 experimental PTM sites in dbPTM
3.0. All the experimental PTM sites are further categorized
by PTM type and the number of non-redundant PTM
sites is calculated. As shown by the data statistics of
representative PTM types in Table 1, protein phosphorylation
has the most abundant data of experimentally
verified substrate sites. Due to the high throughput of
MS/MS-based proteomics in the site-specific identification
of modified peptides, several PTMs have a significantly
increasing number of experimental data, including protein
ubiquitylation, acetylation, methylation, N-linked
and O-linked glycosylation, as well as the emerging
S-nitrosylation. In addition to the experimental PTM
sites, UniProtKB/Swiss-Prot provides putative PTM 
sites by using sequence similarity or evolutionary poten- 
tial, which are annotated as 'by similarity', 'potential' or 
'probable' in the 'MOD_RES' fields. A total of 226 122
putative sites for all PTM types are integrated in dbPTM.
Moreover, a KinasePhos-like method (19-22) has been
adopted to construct profile hidden Markov models
(HMMs) for 18 types of PTM. For protein phosphorylation
in particular, >70 kinase-specific prediction models are
constructed and used to identify the putative phosphorylation
sites with their kinases. These models were applied
to search for potential PTM sites in UniProtKB/
Swiss-Prot protein sequences. As given in Table 1,
a total of 2 509 267 putative sites for all PTM types are
detected by HMMs with 90% predictive specificity. All
the experimental PTM sites and putative PTM sites are
available for download from the web interface.

Table 1. Data statistics of experimental and putative PTM sites in dbPTM
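
The redundancy removal mentioned above can be illustrated by merging
records from several resources on a common key. The key used here
(UniProtKB accession, residue position, PTM type) is an assumption made
for this sketch, as are the function name and the example records; the
source does not state the exact criterion.

```python
from collections import defaultdict


def merge_sites(records):
    """Collapse duplicate PTM sites reported by different resources.

    records: iterable of (accession, position, ptm_type, source).
    Returns a dict mapping each unique site to the set of its sources.
    """
    merged = defaultdict(set)
    for accession, position, ptm_type, source in records:
        merged[(accession, position, ptm_type)].add(source)
    return dict(merged)


# Hypothetical records; accessions and positions are placeholders.
records = [
    ("P00001", 15, "phosphorylation", "Phospho.ELM"),
    ("P00001", 15, "phosphorylation", "PhosphoSitePlus"),
    ("P00001", 15, "acetylation", "UniProtKB/Swiss-Prot"),
]
non_redundant = merge_sites(records)
print(len(non_redundant))
```

Keeping the set of contributing resources for each site preserves
provenance, which is useful when a site is later shown together with its
supporting literature.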

Enhanced web interface 

To facilitate the use of the dbPTM resource, the web inter- 
face has been redesigned and enhanced to allow efficient 
access to the protein of interest. Supplementary Figure S3 
shows the content of a typical dbPTM query: (i) quick
search by IDs and keywords, (ii) basic information,
(iii) graphical visualization of PTM sites with structural
characteristics and functional domains, (iv) table of
experimental PTM sites with reported literature,
(v) orthologous conservation of PTM substrate sites,
(vi) PPIs and domain-domain interactions and (vii) literature
related to PTMs. The combined visualization of PTM
sites and functional domains for a protein sequence can help
users to understand the functional associations of PTM
substrate sites. According to the multiple sequence alignment
of orthologous proteins, users can investigate
whether a PTM site is located in an evolutionarily conserved
region, which would indicate that the orthologous sites in
other species could be involved in the same modification.
Additionally, this update incorporates the protein functional
domains and domain-domain interactions to infer
the PTM-dependent protein interactions. Moreover, the
literature associated with PTMs is categorized by
modification type.

In addition to the database query by protein name,
gene name, UniProtKB ID or accession, a protein
sequence can be submitted for homology search against
the UniProtKB protein sequence database using the BLAST (65)
program. For the browse function of the dbPTM web site, a
summary table of PTM types and their modified
residues is provided so that users can efficiently access the
number of data for a specific modified amino acid of a
PTM type. The annotations of PTM types refer
to the UniProtKB/Swiss-Prot PTM list
(http://www.uniprot.org/docs/ptmlist.txt). As depicted in
Supplementary Figure S4, the acetylation of lysine (K) is chosen to
obtain more detailed information such as the location of
the modification in the protein sequence, the modified
chemical formula, the mass difference and the substrate
site specificity, which is the preference of amino acids
surrounding the modification sites. The structural
characteristics, such as solvent accessibility and secondary
structure surrounding the PTM substrate sites, are also
provided. Additionally, the substrate site specificity of
the acetylated lysines is investigated in detail with refer- 
ence to the subcellular localizations of acetylated proteins. 
Previous work has demonstrated that the co-localization 
of acetyltransferases and substrate proteins could be a 
promising method to investigate the substrate site 
specificities and could be adopted to improve the compu- 
tational identification of protein acetylation sites (66). 

Investigation of PTM substrate site specificities 

Given a window length, n, the fragment of 2n + 1 residues
centered on the PTM site (position 0) is extracted and the
positional frequencies of amino acids are calculated and
presented as sequence logos by WebLogo (67).
Supplementary Figure S5 shows the substrate motif and 
structural characteristics of experimental phosphorylation 
sites. According to the kinase classification extracted from 
KinBase (http://kinase.com/) and RegPhos (10), the substrate
site specificity of protein phosphorylation could be
further categorized into >200 kinase groups. As given in
Supplementary Figure S5, most of the kinase-specific sub- 
strate motifs have conserved amino acids surrounding the 
phosphorylation sites. For the PTMs other than phos- 
phorylation, there are no annotations of catalytic 
enzymes or transferases due to the experimental difficulty 
in identifying the catalytic enzymes for a specific PTM. 
Based on the basic concept of sequence conservation, a 
sequence logo could display the substrate motif for each 
PTM type with a group of aligned sequences. However, it 
is difficult to explore conserved motifs for large-scale 
sequence data; for instance, a sequence logo for all phosphorylation
data involving various catalytic kinases
fails to present the kinase-specific substrate specificity
clearly. Thus, for each PTM with sufficient data of
experimental substrate sites, MDDLogo was applied to
cluster a group of aligned substrate sequences into subgroups
containing statistically significant motifs. In the
example of protein S-nitrosylation presented in Figure 2,
10 sequence logos, which were identified from 3095
S-nitrosylated peptides with a 13-mer window length,
contain a conserved motif of positively charged amino
acids (K, R and H) surrounding the S-nitrosocysteine.
Interestingly, the first and sixth groups contain
conserved motifs of negatively charged amino acids (D
and E) accompanied by positively charged amino acids
at two specific positions. Consistent with previous
studies (68-73), the S-nitrosylated cysteines may be
located within an acid-base motif flanked by acidic and
basic amino acids.

Figure 2. The MDDLogo-identified substrate motifs of protein S-nitrosylation sites.
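
The window extraction and positional frequency calculation that
underlie such sequence logos can be sketched as follows. Padding
fragments near the protein termini with a gap symbol is an assumption of
this sketch, and the function names and example sequence are
hypothetical; a 13-mer window corresponds to n = 6.

```python
from collections import Counter


def extract_window(seq, site, n, pad="-"):
    """Return the 2n + 1 residue fragment centered on a 1-based site."""
    padded = pad * n + seq + pad * n
    center = site - 1 + n  # index of the site in the padded string
    return padded[center - n:center + n + 1]


def positional_frequencies(windows):
    """Relative frequency of each symbol at every window position."""
    frequencies = []
    for k in range(len(windows[0])):
        counts = Counter(window[k] for window in windows)
        total = sum(counts.values())
        frequencies.append({aa: c / total for aa, c in counts.items()})
    return frequencies


# Hypothetical sequence with a modified cysteine at position 6.
print(extract_window("MKTAYCAKQR", 6, 2))
print(extract_window("MKTAYCAKQR", 1, 2))  # padded at the N-terminus
print(positional_frequencies(["AYCAK", "GYCAR"]))
```

The list returned by `positional_frequencies` has one dictionary per
window position, with the central entry (index n) corresponding to
position 0, the modified residue.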

Investigation of PTM-associated domains and protein 
interactions 

According to the data statistics in dbPTM, >60% of
experimentally verified PTM sites are located in the functional
domains of proteins. Such statistics can be analyzed in
detail for each type of PTM. For instance, for protein
S-nitrosylation, an emerging PTM that plays a
crucial role in the regulation of NO-related cellular
processes, the statistics show that ~70% of the reported
S-nitrosylation sites are located within functional domains.
Furthermore, the detailed distribution of functional
domains covering S-nitrosylation sites is given in
Supplementary Table S4. The most
preferred functional domain is the 'nucleotide-binding
alpha-beta plait' (InterPro ID: IPR012677), which
covers 47 S-nitrosylation sites. Another preferred functional
domain is the 'RNA recognition motif, RNP-1'
domain (InterPro ID: IPR000504), which covers 46
S-nitrosylation sites. This investigation indicates that
these S-nitrosylation sites may play important roles in
the domains of proteins involved in DNA or RNA
binding (74). In addition, Supplementary Table S5
shows the distribution of functional domains covering
substrate sites for several representative PTMs, including
acetylation, methylation, hydroxylation, N-linked and
O-linked glycosylation, phosphorylation and
ubiquitylation.
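
Statistics of this kind (the share of PTM sites inside functional
domains and the number of sites covered by each domain) can be computed
with a simple interval lookup. The data layout, function name and
example identifiers below are hypothetical and serve only to illustrate
the calculation.

```python
from collections import Counter


def domain_coverage(sites, domains):
    """Fraction of sites inside any domain and per-domain site counts.

    sites: list of (accession, position), positions 1-based.
    domains: dict accession -> list of (domain_id, start, end), inclusive.
    """
    per_domain = Counter()
    covered = 0
    for accession, position in sites:
        hits = {
            domain_id
            for domain_id, start, end in domains.get(accession, [])
            if start <= position <= end
        }
        if hits:
            covered += 1
            per_domain.update(hits)
    fraction = covered / len(sites) if sites else 0.0
    return fraction, per_domain


# Placeholder proteins and domain identifiers.
sites = [("P1", 10), ("P1", 50), ("P2", 5), ("P2", 7)]
domains = {
    "P1": [("DOM_A", 1, 20)],
    "P2": [("DOM_A", 1, 6), ("DOM_B", 4, 8)],
}
print(domain_coverage(sites, domains))
```

A site lying in two overlapping domains is counted once for the overall
fraction but contributes to the count of both domains.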

Many PTMs provide binding sites for specific protein-interaction
domains, which often contain a conserved
structure for the modified site and a more flexible
surface for the flanking amino acids; together, these
interactions regulate cellular processes (75-78). In order to
investigate the PTM-associated protein interactions, the information
on domain-domain interactions collected from InterDom
is adopted in this study. In the case study of 'Histone H3'
(UniProtKB ID: H31_HUMAN) presented in Figure 3,
'Heterochromatin protein 1 homolog alpha' ('HP1',
UniProtKB ID: CBX5_HUMAN) and 'WD repeat-containing
protein 5' ('WDR5', UniProtKB ID:
WDR5_HUMAN) interact with 'Histone H3'. When
investigating the protein interaction between 'HP1' and
'Histone H3' in detail, there is a domain-domain interaction
between 'Chromodomain' (InterPro ID:
IPR000953) and 'Histone H3' (InterPro ID: IPR000164).
Among the PTMs located in the domain of 'Histone H3',
a previous study has demonstrated that the 'HP1
chromodomain' can bind to 'Histone H3' methylated
at lysine 10 (79). Another protein interaction shows that
there is a domain-domain interaction between the 'WD40
Repeat' (InterPro ID: IPR001680) and 'Histone Core'
(InterPro ID: IPR007125). It has been proposed that the
structural motif for the specific recognition of methylated
'Histone H3' lysine 5 by the 'WD40 Repeat' of 'WDR5' is
essential to vertebrate development (80,81). This investigation
indicates that other PTM sites could be
potential binding sites for protein-interaction domains.

Figure 3. A case study of domain-domain interactions and PTM-associated protein interactions on Histone H3 (UniProtKB ID: H31_HUMAN).
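
The inference outlined above, combining a protein interaction, a
domain-domain interaction and the PTM sites inside the interacting
domain, can be sketched as a nested lookup. The protein and InterPro
identifiers in the example come from the case study, but the domain
coordinates are placeholders chosen for illustration, and the function
name and data layout are hypothetical.

```python
def ptm_sites_in_interacting_domains(substrate, partner, domains, ddi, ptm_sites):
    """List substrate PTM sites lying in a domain that interacts with a partner domain.

    domains: dict protein -> list of (domain_id, start, end), inclusive.
    ddi: set of frozensets, each holding an interacting domain pair.
    ptm_sites: dict protein -> list of (position, ptm_type).
    """
    hits = []
    for sub_dom, start, end in domains.get(substrate, []):
        for par_dom, _, _ in domains.get(partner, []):
            if frozenset((sub_dom, par_dom)) not in ddi:
                continue
            for position, ptm_type in ptm_sites.get(substrate, []):
                if start <= position <= end:
                    hits.append((position, ptm_type, sub_dom, par_dom))
    return hits


# Domain coordinates below are placeholders, not curated annotations.
domains = {
    "H31_HUMAN": [("IPR000164", 1, 136)],
    "CBX5_HUMAN": [("IPR000953", 20, 79)],
}
ddi = {frozenset(("IPR000953", "IPR000164"))}
ptm_sites = {"H31_HUMAN": [(10, "methylation")]}
print(ptm_sites_in_interacting_domains(
    "H31_HUMAN", "CBX5_HUMAN", domains, ddi, ptm_sites))
```

The result is a list of candidate sites only; whether a given
modification actually mediates the interaction requires experimental
support such as that cited for the methylated lysine.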

Investigation of PTM sites on TM proteins 

According to the data statistics of PTM sites and TM
proteins in dbPTM, a total of 9644 and 68 775 PTM substrate
sites are located on the 2088 experimental and 33 747
potential TM proteins, respectively. In order to investigate
the structural distribution of PTM sites on TM proteins,
the structural topologies of a TM protein are
categorized into four types: extracellular, cytoplasmic,
TM and unknown regions. Supplementary Table S6 provides
the structural distribution of PTMs containing >10
substrate sites on experimental TM proteins. Interestingly,
excluding substrate sites located in
unknown regions, all of the N-linked (GlcNAc...) glycosylation
sites are located in the extracellular region, as are
the O-linked and C-linked glycosylation sites. This observation
is consistent with the biological
role of glycosylation on TM proteins in
receptor targeting and cell-cell interactions (47).
In contrast, the phosphorylation sites are mainly located
in cytoplasmic regions, where they induce signal transduction
and ion transport. The structural distribution of PTM
sites could therefore be a means to infer the potential roles of
PTMs on TM proteins. Indeed, previous
work has demonstrated that the incorporation of
membrane topology could improve the performance of
predicting O-linked glycosylation sites on TM proteins
(82). Supplementary Figure S6 shows a graphical
visualization of the PTMs and membrane topology on
human Beta-2 adrenergic receptor (ADRB2). Furthermore,
two modification sites, Tyr141 (pTyr) and Cys341
(S-palmitoyl cysteine), are highlighted in red on the
tertiary structure (PDB ID: 2R4R) using the Jmol viewer,
which indicates the solvent accessibility and distance
between them.

Table 2. Advances and improvements in this update (dbPTM 3.0)

| Features | dbPTM 1.0 | dbPTM 2.0 | dbPTM 3.0 |
|---|---|---|---|
| Protein entry | UniProtKB/Swiss-Prot (release 46) | UniProtKB/Swiss-Prot (release 55) | UniProtKB release 2012-04 |
| Experimental PTM resource | UniProtKB/Swiss-Prot, Phospho.ELM and O-GLYCBASE | UniProtKB/Swiss-Prot, Phospho.ELM, PHOSIDA, HPRD, O-GLYCBASE and UbiProt | UniProtKB/Swiss-Prot, HPRD, SysPTM, Phospho.ELM, PhosphoSitePlus, PHOSIDA, O-GLYCBASE, dbOGAP, dbSNO, UbiProt and PupDB |
| Literature survey of PTMs | - | - | >5000 modified peptides extracted from ~800 articles |
| Literature related to PTMs | - | Yes | Yes (categorized by PTM types) |
| Computationally predicted PTMs | Phosphorylation, glycosylation and sulfation | 20 types of PTM | 18 types of PTM |
| Protein tertiary structure | Protein Data Bank (PDB) | Protein Data Bank (PDB) | Protein Data Bank (PDB) |
| Structural properties of PTM sites | Amino acid frequency | Amino acid frequency, solvent accessibility and secondary structure | Amino acid frequency, solvent accessibility, secondary structure and intrinsic disorder region |
| PTM annotation | RESID (373 PTM annotations) | RESID (431 PTM annotations) | RESID (431 PTM annotations) |
| Kinase family annotation | - | KinBase | KinBase and RegPhos |
| Protein functional domain | InterPro | InterPro | InterPro and InterProScan |
| Protein-protein interaction | - | - | DIP, MINT, IntAct, HPRD and STRING |
| Domain-domain interaction | - | - | InterDom |
| Functional association of PTM | - | - | PTM-associated domains and PTM-dependent protein interactions |
| PTM substrate motif | - | WebLogo | WebLogo and MDDLogo |
| Evolutionary conservation of PTM sites | - | ClustalW | ClustalW and COG |
| Transmembrane topology | - | - | TMPad, PDBTM, TOPDB and OPM |
| Graphical visualization | PTM, solvent accessibility, protein variation and protein domain | PTM, solvent accessibility, secondary structure, protein variation, protein domain, tertiary structure, orthologous conservation and sequence logo | PTM, solvent accessibility, secondary structure, protein variation, protein domain, tertiary structure, orthologous conservation, sequence logo, PTM substrate motifs, domain-domain interaction, protein-protein interaction, transmembrane topology and tertiary structure of PTMs |
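
The structural distribution of PTM sites over the four topology classes
(extracellular, cytoplasmic, TM and unknown) can be tallied with a short
sketch such as the one below. The topology segments and site positions
are toy values, not the annotation of any real protein, and the function
names are hypothetical.

```python
from collections import Counter


def region_of(position, topology):
    """Return the topology label covering a 1-based position."""
    for start, end, label in topology:
        if start <= position <= end:
            return label
    return "unknown"


def region_distribution(sites, topology):
    """Count PTM sites per (PTM type, topology region)."""
    return Counter(
        (ptm_type, region_of(position, topology))
        for position, ptm_type in sites
    )


# Toy topology: (start, end, label), inclusive 1-based coordinates.
topology = [
    (1, 30, "extracellular"),
    (31, 53, "TM"),
    (54, 100, "cytoplasmic"),
]
sites = [
    (15, "N-linked glycosylation"),
    (70, "phosphorylation"),
    (80, "phosphorylation"),
    (150, "phosphorylation"),
]
print(region_distribution(sites, topology))
```

Positions not covered by any annotated segment fall into the 'unknown'
class, matching the fourth category used in the text.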



CONCLUSION 

The expansion of the dbPTM database increases its use- 
fulness for researchers investigating the impact of PTMs 
on protein function and cellular processes. Additionally, 
the enhanced web interface enables both wet-lab biologists
and bioinformatics researchers to explore efficiently
further information about PTMs. Table 2 summarizes
the advancements and new features supported in
dbPTM 3.0. In the future, we expect dbPTM to continue 
to grow with the increasing availability of data in re- 
sources such as Phospho.ELM, PhosphoSitePlus and 
UniProtKB. One area in which we envision dbPTM improving
greatly in future work is the implementation of a
more accurate method for the discovery of PTM substrate
motifs. Also, enhancements to the text-mining algorithm
will enable the system to select MS/MS peptides from
research articles associated with protein modifications
with higher confidence. In order to provide more
adequate information on PTM function, descriptions
of the biological function of PTMs will be
extracted from research articles using an information
retrieval system. Moreover, the thermodynamic parameters
for proteins (83), PPIs (84) and protein-nucleic acid
interactions (85) could be integrated for the investigation of
PTM-associated protein stability.



AVAILABILITY 

The data content of dbPTM will be regularly maintained 
and semiannually updated. The resource is now available 
at http://dbPTM.mbc.nctu.edu.tw/.